HyDR-MI: A hybrid algorithm to reduce dimensionality in multiple instance learning
Abstract
Feature selection techniques have been applied successfully in many applications to make supervised learning more effective and efficient. These techniques have been widely used and studied in traditional supervised learning settings, where each instance is expected to have a label. In multiple instance learning (MIL), each example, or bag, consists of a variable set of instances, and the label is known for the bag as a whole but not for the individual instances it contains. Using these labels for feature selection in MIL is therefore less straightforward. In this paper we study a new feature subset selection method for MIL called HyDR-MI (Hybrid Dimensionality Reduction method for Multiple Instance learning). The hybrid consists of a filter component, based on an extension of the ReliefF algorithm developed for MIL, and a wrapper component, based on a genetic algorithm that searches for the best feature subset within the reduced set of features output by the filter component. We conducted an extensive experimental evaluation of our method on five benchmark datasets and seventeen classification algorithms for MIL. The results show the potential of the proposed hybrid: it significantly improves the predictive performance of many MIL classification techniques compared with filter-based feature selection. This improvement comes from the ability to decide how many of the top-ranked features are useful for each particular algorithm and to discard redundant attributes.
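The filter-then-wrapper structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the filter below is a simple class-mean gap on bag averages (a stand-in for ReliefF-MI), the wrapper fitness is leave-one-out accuracy of a nearest-centroid rule on bag averages (a stand-in for a real MIL classifier), and all function names, parameters, and the toy data are hypothetical. It assumes binary labels 0/1, at least two bags per class, and at least two candidate features.

```python
import random

def bag_means(bags):
    """Collapse each bag (a list of instance vectors) to its per-feature mean."""
    return [[sum(col) / len(inst) for col in zip(*inst)] for inst, _ in bags]

def filter_rank(bags):
    """Stand-in filter (NOT ReliefF-MI): rank features by the gap between class means."""
    means, labels = bag_means(bags), [y for _, y in bags]
    n_feat = len(means[0])
    def class_mean(f, c):
        vals = [m[f] for m, y in zip(means, labels) if y == c]
        return sum(vals) / len(vals)
    scores = [abs(class_mean(f, 1) - class_mean(f, 0)) for f in range(n_feat)]
    return sorted(range(n_feat), key=lambda f: -scores[f])

def loo_accuracy(bags, feats):
    """Stand-in wrapper fitness: leave-one-out accuracy of a nearest-centroid rule."""
    if not feats:
        return 0.0
    means, labels = bag_means(bags), [y for _, y in bags]
    hits = 0
    for i, (m, y) in enumerate(zip(means, labels)):
        dist = {}
        for c in (0, 1):
            rest = [means[j] for j in range(len(bags)) if j != i and labels[j] == c]
            cen = [sum(r[f] for r in rest) / len(rest) for f in feats]
            dist[c] = sum((m[f] - v) ** 2 for f, v in zip(feats, cen))
        hits += (min(dist, key=dist.get) == y)
    return hits / len(bags)

def ga_wrapper(bags, candidates, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Tiny genetic algorithm over bit masks of the candidate features."""
    rng = random.Random(seed)
    def fitness(mask):
        feats = [f for f, bit in zip(candidates, mask) if bit]
        # Small per-feature penalty so redundant features tend to be dropped.
        return loo_accuracy(bags, feats) - 0.01 * len(feats)
    pop = [[rng.randint(0, 1) for _ in candidates] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]               # truncation selection, elitist
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(candidates))  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([bit ^ (rng.random() < p_mut) for bit in child])
        pop = parents + children
    best = max(pop, key=fitness)
    return [f for f, bit in zip(candidates, best) if bit]

def hybrid_select(bags, top_k):
    """Filter step keeps the top_k ranked features; wrapper step searches within them."""
    return ga_wrapper(bags, filter_rank(bags)[:top_k])

def make_toy_bags(n_bags=20, n_feat=6, seed=1):
    """Hypothetical toy data: only feature 0 carries the bag label."""
    rng = random.Random(seed)
    bags = []
    for i in range(n_bags):
        label = i % 2
        inst = [[rng.gauss(0, 1) for _ in range(n_feat)]
                for _ in range(rng.randint(2, 5))]
        for x in inst:
            x[0] += 4.0 * label
        bags.append((inst, label))
    return bags

if __name__ == "__main__":
    toy = make_toy_bags()
    print("filter ranking:", filter_rank(toy))
    print("selected subset:", hybrid_select(toy, top_k=3))
```

The split mirrors the design choice in the abstract: the inexpensive filter narrows the search space, so the more expensive wrapper only evaluates subsets of the top-ranked features and can tailor the final subset to whichever classifier supplies the fitness.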
Similar resources
ReliefF-MI: An extension of ReliefF to multiple instance learning
In machine learning the so-called curse of dimensionality, pertinent to many classification algorithms, denotes the drastic increase in computational complexity and classification error with data having a great number of dimensions. In this context, feature selection techniques try to reduce dimensionality finding a new more compact representation of instances selecting the most informative fea...
EM-DD: An Improved Multiple-Instance Learning Technique
We present a new multiple-instance (MI) learning technique (EM-DD) that combines EM with the diverse density (DD) algorithm. EM-DD is a general-purpose MI algorithm that can be applied with boolean or real-value labels and makes real-value predictions. On the boolean Musk benchmarks, the EM-DD algorithm without any tuning significantly outperforms all previous algorithms. EM-DD is relatively ins...
Learning from Data with Complex Interactions and Ambiguous Labels
In this thesis, we develop and evaluate machine learning algorithms that can learn effectively from data with complex interactions and ambiguous labels. The need for such algorithms is motivated by such problems as protein-protein binding and drug activity prediction. In the first part of the thesis, we focus on the problem of myopia. This problem arises when greedy learning strategies are appl...
MICCLLR: A Generalized Multiple-Instance Learning Algorithm Using Class Conditional Log Likelihood Ratio
We propose a new generalized multiple-instance learning (MIL) algorithm, MICCLLR (multiple-instance class conditional likelihood ratio), that converts the MI data into a single meta-instance data allowing any propositional classifier to be applied. Experimental results on a wide range of MI data sets show that MICCLLR is competitive with some of the best performing MIL algorithms reported in li...
G3P-MI: A genetic programming algorithm for multiple instance learning
This paper introduces a new Grammar-Guided Genetic Programming algorithm for resolving multi-instance learning problems. This algorithm, called G3P-MI, is evaluated and compared to other multi-instance classification techniques in different application domains. Computational experiments show that the G3P-MI often obtains consistently better results than other algorithms in terms of accuracy, se...
Journal title:
- Inf. Sci.
Volume 222
Publication date: 2013